Restructuring Arrays for Efficient Parallel Loop Execution
Abstract
In a sequential program, data are often structured in a way that is optimized for sequential execution. However, when the program is parallelized, the data access pattern may change drastically. If the structure of the data is not changed accordingly, parallel performance will suffer. In this paper, we consider this problem in the context of runtime loop parallelization [8, 9], which is a general technique to parallelize loops not amenable to compile-time analysis. In a parallel execution of a loop, iterations may be performed in a very different order than in the sequential execution. This may result in undesirable cache effects on distributed shared-memory multiprocessors, unless the structure of the arrays accessed by these iterations is changed accordingly. We discuss what these problems are and how they arise. We then describe two data restructuring techniques to address them: the restructuring of read-write arrays to reduce inter-processor communication due to false sharing, and the restructuring of read-only arrays to improve spatial locality. We also report experiments on a KSR1 [3] to evaluate the effectiveness of these techniques and the preprocessing and postprocessing overheads they entail. The results show that the restructuring techniques can substantially improve the performance of the parallelized loop. When restructuring overheads are ignored, we see a doubling of parallel speedups. While restructuring overheads can be quite significant, they can often be amortized across multiple loop executions so that they do not outweigh the performance benefits. In our experiments, it takes only two loop executions to achieve this.
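The abstract names the two restructuring techniques but does not give their algorithms. The following Python sketch only illustrates the general idea and is not the paper's method. It assumes the parallel schedule is already known (which processor reads or writes which elements, and in what order). The function names, the `owner` array, and the `line_elems` parameter (number of array elements per cache line) are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def restructure_read_only(
    data: Sequence[float], reads_by_proc: List[List[int]]
) -> Tuple[List[float], List[Dict[int, int]]]:
    """Copy a read-only array so each processor's elements are stored
    contiguously, in the order that processor first reads them.

    Assumption: duplicating an element read by several processors is
    acceptable because the array is never written.
    Returns the new array and, per processor, a map old index -> new index.
    """
    new_data: List[float] = []
    remap: List[Dict[int, int]] = []
    for reads in reads_by_proc:
        local: Dict[int, int] = {}
        for old in reads:
            if old not in local:  # place each element once per processor
                local[old] = len(new_data)
                new_data.append(data[old])
        remap.append(local)
    return new_data, remap


def restructure_read_write(
    owner: Sequence[int], num_procs: int, line_elems: int
) -> Tuple[Dict[int, int], int]:
    """Assign new positions to a read-write array so that no cache line
    holds elements written by two different processors.

    Assumption: owner[i] is the single processor that writes element i.
    Returns a map old index -> new index and the padded array length.
    """
    new_index: Dict[int, int] = {}
    base = 0
    for p in range(num_procs):
        mine = [i for i, o in enumerate(owner) if o == p]
        for k, old in enumerate(mine):
            new_index[old] = base + k
        # Pad this processor's block up to a whole number of cache lines.
        base += -(-len(mine) // line_elems) * line_elems
    return new_index, base
```

In this sketch, the loop body would index the restructured arrays through the returned maps, and a postprocessing step would copy read-write results back to the original layout. Those two steps correspond to the preprocessing and postprocessing overheads the abstract mentions.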
Similar resources
On the Optimum Directivity of Uniformly Spaced Broadside Arrays of Parallel Half-Wave Dipoles (RESEARCH NOTES)
The nominal directivity for uniformly spaced broadside parallel half-wave dipoles associated with a uniform excitation is evaluated. The amplitude distribution for an optimized directivity is then obtained for different numbers of elements with the separations between the dipoles as a variable. The optimum and nominal directivities are compared for different spacings of the elements. While thes...
Array Restructuring for Cache Locality
Array Restructuring for Cache Locality, by Shun-Tak Albert Leung. Chairperson of Supervisory Committee: Professor John Zahorjan, Department of Computer Science and Engineering. Caches are used in almost every modern processor design to reduce the long memory access latency, which is increasingly a bottleneck to program performance. For caches to be effective, programs must exhibit good data localit...
I Contents 1 Introduction 1 2 Monolithic Arrays 1 3 from Macs to Double Loops: an Example 4 4 Problem Formulation and Solution Strategy 6
Unimodular transformation has been proposed as a powerful technique for loop parallelization in imperative language programs. In this paper, we propose to apply unimodular transformation to functional language programs. In particular, we propose a method of applying unimodular transformation to monolithic arrays such as Haskell array comprehensions. Using our method, a compiler can deduce a sa...
Extending Gödel for Expressing Restricted Quantifications and Arrays
The expressiveness of the declarative language Gödel can be improved by adding to it bounded quantifications, i.e., quantifications over finite domains, and arrays. Many problems can be expressed more concisely using bounded quantifications than using recursion. Arrays are natural for many applications, e.g., in scientific computing, and are conveniently used in bounded quantifications. Treating bou...
Publication date: 2013